85 research outputs found
Moment-Matching Polynomials
We give a new framework for proving the existence of low-degree, polynomial
approximators for Boolean functions with respect to broad classes of
non-product distributions. Our proofs use techniques related to the classical
moment problem and deviate significantly from known Fourier-based methods,
which require the underlying distribution to have some product structure.
Our main application is the first polynomial-time algorithm for agnostically
learning any function of a constant number of halfspaces with respect to any
log-concave distribution (for any constant accuracy parameter). This result was
not known even for the case of learning the intersection of two halfspaces
without noise. Additionally, we show that in the "smoothed-analysis" setting,
the above results hold with respect to distributions that have sub-exponential
tails, a property satisfied by many natural and well-studied distributions in
machine learning.
Given that our algorithms can be implemented using Support Vector Machines
(SVMs) with a polynomial kernel, these results give a rigorous theoretical
explanation as to why many kernel methods work so well in practice
Learning Graphical Models Using Multiplicative Weights
We give a simple, multiplicative-weight update algorithm for learning
undirected graphical models or Markov random fields (MRFs). The approach is
new, and for the well-studied case of Ising models or Boltzmann machines, we
obtain an algorithm that uses a nearly optimal number of samples and has
quadratic running time (up to logarithmic factors), subsuming and improving on
all prior work. Additionally, we give the first efficient algorithm for
learning Ising models over general alphabets.
Our main application is an algorithm for learning the structure of t-wise
MRFs with nearly-optimal sample complexity (up to polynomial losses in
necessary terms that depend on the weights) and running time that is
. In addition, given samples, we can also learn the
parameters of the model and generate a hypothesis that is close in statistical
distance to the true MRF. All prior work runs in time for
graphs of bounded degree d and does not generate a hypothesis close in
statistical distance even for t=3. We observe that our runtime has the correct
dependence on n and t assuming the hardness of learning sparse parities with
noise.
Our algorithm--the Sparsitron-- is easy to implement (has only one parameter)
and holds in the on-line setting. Its analysis applies a regret bound from
Freund and Schapire's classic Hedge algorithm. It also gives the first solution
to the problem of learning sparse Generalized Linear Models (GLMs)
Embedding Hard Learning Problems Into Gaussian Space
We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in theoretical computer science and show that any algorithm for agnostically learning halfspaces requires n^Omega(log(1/epsilon)) time under the assumption that k-sparse LPN requires n^Omega(k) time, ruling out a polynomial time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian.
We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et. al. 2013 who show that sparse polynomials are learnable under random Gaussian noise in polynomial time.
Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions
- …